A parallel butterfly algorithm
Abstract
The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x, y)\, g(y)\, dy$ at large numbers of target points when the kernel, $K(x, y)$, is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In $d$ dimensions with $O(N^d)$ quasi-uniformly distributed source and target points, when each appropriate submatrix of $K$ is approximately rank-$r$, the running time of the algorithm is at most $O(r^2 N^d \log N)$. A parallelization of the butterfly algorithm is introduced which, assuming a message latency of $\alpha$ and per-process inverse bandwidth of $\beta$, executes in at most $O\left(r^2 \frac{N^d}{p} \log N + \left(\beta r \frac{N^d}{p} + \alpha\right) \log p\right)$ time using $p$ processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where $K(x, y) = \exp(i\Phi(x, y))$, where $\Phi(x, y)$ is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1 node (16 cores) up to 1024 nodes (16,384 cores) with greater than 90% and 82% efficiency, respectively.
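To make the "discrete analogue" concrete, the following is a minimal sketch of the brute-force sum $f(x_i) = \sum_j \exp(i\Phi(x_i, y_j))\, g(y_j)$ that a butterfly algorithm approximates. It is not the butterfly algorithm and not the DistButterfly API: the function names, the Fourier-type phase $\Phi(x, y) = 2\pi\, x \cdot y$, and the small uniform grid are all assumptions chosen for illustration. With $N^d$ sources and $N^d$ targets, this direct evaluation costs $O(N^{2d})$ kernel evaluations, which is the cost the $O(r^2 N^d \log N)$ bound improves on.

```python
import cmath
import math


def direct_transform(phase, targets, sources, weights):
    """Brute-force evaluation of f(x) = sum_j exp(i * phase(x, y_j)) * g_j.

    Every (target, source) pair is visited once, so the cost is
    len(targets) * len(sources) kernel evaluations.
    """
    result = []
    for x in targets:
        total = 0j
        for y, g in zip(sources, weights):
            total += cmath.exp(1j * phase(x, y)) * g
        result.append(total)
    return result


def fourier_phase(x, y):
    # Hypothetical example phase (an assumption, not from the source):
    # Phi(x, y) = 2 * pi * <x, y>.
    return 2.0 * math.pi * sum(a * b for a, b in zip(x, y))


# Illustrative setup: d = 2 with an n-by-n uniform grid on [0, 1)^2,
# used for both sources and targets, and unit source weights.
n = 8
grid = [(i / n, j / n) for i in range(n) for j in range(n)]
weights = [1.0 + 0j] * len(grid)

values = direct_transform(fourier_phase, grid, grid, weights)
```

Any real-valued callable can be passed as `phase`, which mirrors the black-box role of $\Phi$ described in the abstract.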
Similar resources
Parallel Algorithms for String Matching Problem Based on Butterfly Model
The string matching problem is one of the most studied problems in computer science. While it is very easily stated and many of the simple algorithms perform very well in practice, numerous works have been published on the subject and research is still very active. In this paper we propose a butterfly parallel computing model for parallel string matching. Experimental results show that, on a mu...
ASIC Design of Butterfly Unit Based on Non-Redundant and Redundant Algorithm
Fast Fourier Transform (FFT) processors employing a pipeline architecture consist of a series of Processing Elements (PE) or Butterfly Units (BU). The BU or PE of an FFT performs multiplication and addition on complex numbers. This paper proposes a single BU to compute a radix-2, 8-point FFT in the time domain as well as the frequency domain by replacing a series of PEs. This BU comprises fused floating ...
Conquering Edge Faults in a Butterfly with Automorphisms
Mapping an algorithm to an architecture with faults is an important problem in parallel processing. This paper deals with wrapped butterfly architectures with edge faults. We investigate the effect of automorphisms of a wrapped butterfly on its edges. Given a fault set, one can then choose an appropriate automorphism to map the algorithm to use only fault-free edges. By using an algebraic model...
Fast Parallel Sorting Algorithms on GPUs
This paper presents a comparative analysis of three widely used parallel sorting algorithms (Odd-Even sort, Rank sort, and Bitonic sort) in terms of sorting rate, sorting time, and speed-up on CPU and different GPU architectures. Alongside, we have implemented a novel parallel algorithm, the min-max butterfly network, for finding the minimum and maximum in large data sets. All algorithms have been impleme...
Communication-Efficient Distributed Stochastic Gradient Descent with Butterfly Mixing
Stochastic gradient descent is a widely used method to find locally-optimal models in machine learning and data mining. However, it is naturally a sequential algorithm, and parallelization involves severe compromises because the cost of synchronizing across a cluster is much larger than the time required to compute an optimal-sized gradient step. Here we explore butterfly mixing, where gradient...
A GVT Based Algorithm for Butterfly Barrier in Parallel and Distributed Systems
Mattern’s GVT algorithm is a time management algorithm that helps achieve synchronization in parallel and distributed systems. This algorithm uses a ring structure to establish cuts C1 and C2 to calculate the GVT. The latency of calculating the GVT is vital in parallel/distributed systems, and it is extremely high when calculated using this algorithm. However, using synchronous barriers with the ...
Journal:
- SIAM J. Scientific Computing
Volume: 36
Publication date: 2014